A Proposed Unicode-Based Extended Romanization System for Persian Texts
Abstract
Various romanization schemes have so far been proposed for representing Persian text in the Latin alphabet. However, each has served a very specific and limited function. This paper proposes an extended romanization scheme that can support the wide range of encoding needs that arise in natural language processing. The proposed scheme endeavors to preserve both orthographic and phonological phenomena of the language. It also accounts for the encoding of handwritten manuscripts, in which glyph ambiguity is a salient feature, and is particularly relevant to romanizing the Kufi script, in which diacritical marks are omitted. The work also recommends orthographic rules in an effort to standardize future romanization tasks.
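The abstract does not reproduce the proposed character table, so the following is only a minimal sketch of the general idea of a reversible, orthography-preserving romanization keyed on Unicode code points. The mapping `TO_LATIN` and the functions `romanize` and `deromanize` are hypothetical names and values chosen for illustration; they are not the scheme proposed in the paper.

```python
# Hypothetical one-to-one mapping, for illustration only.
# This is NOT the table proposed in the paper.
TO_LATIN = {
    "\u0627": "A",  # ARABIC LETTER ALEF
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u067E": "p",  # ARABIC LETTER PEH
    "\u062F": "d",  # ARABIC LETTER DAL
    "\u0631": "r",  # ARABIC LETTER REH
    "\u0646": "n",  # ARABIC LETTER NOON
    "\u200C": "-",  # ZERO WIDTH NON-JOINER
}

# Because the mapping is one-to-one, it can be inverted directly.
TO_PERSIAN = {latin: persian for persian, latin in TO_LATIN.items()}


def romanize(text: str) -> str:
    """Replace each mapped Persian code point; pass other characters through."""
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


def deromanize(text: str) -> str:
    """Invert romanize() for text that uses only mapped characters."""
    return "".join(TO_PERSIAN.get(ch, ch) for ch in text)


word = "\u067E\u062F\u0631"  # the letters PEH, DAL, REH
print(romanize(word))
print(deromanize(romanize(word)) == word)
```

A purely orthographic mapping like this one carries over only the written letters, so unwritten short vowels do not appear in the output. Preserving phonological information as well, as the abstract describes, would need additional symbols beyond this sketch.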
Similar Resources
A Revised Unicode based Sorting Algorithm for Bengali Texts
This paper describes a sorting algorithm for Bengali texts, which is one of the most vital tasks for Bengali Natural Language Processing. As Unicode is far preferable to ASCII encoding, we need to use this representation for the Bengali language. But due to some distinct properties of the Bengali language, texts cannot be sorted directly using the order in the Unicode character scheme. A few works ha...
A Plagiarism Detection Approach Based on SVM for Persian Texts
Plagiarism is defined as an unauthorized act of using or adapting others’ works and ideas without referring to them. Numerous methods have been proposed to detect plagiarism in different languages; however, not a lot has been accomplished in Persian. The present study has utilized statistical and semantic features to determine the functionality of Support Vector Machines (SVMs) in detecting act...
Proposed Update Unicode Technical Report
Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This is especially important as more and more products are internationalized. This document describes some of the security considerations that programmers, system analysts, standards developers, and user...
Rumi Numeral System Symbols, Additional characters proposed to Unicode
A special numeral system, rumi, has been in use in North Africa since the 10th century. It remained in use until the 17th century. This system has been especially used in the administration of the city of Fez in Morocco. It has also been used in Al-Andalusians, Spain, starting from the 12th century. The forms of the digits are quite different from the Arabic or the Arabic-Indic digits in use today....
A Corpus-Based Study of the Frequency of Personal Pronouns in Translated and Comparable Non-Translated Persian Texts
No abstract available.
Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course “The Study of Islamic Texts Translation”
The aim of the present study is to propose a model based on speech acts and language functions for developing materials for the course "The Study of Islamic Texts Translation". To produce better and more engaging materials than the existing textbooks, the new model draws on Austin's (1962) levels of speech acts, Searle's (1976) classification of speech acts, and Halliday's (1978) language functions. For this purpose, 57 verses were randomly selected from different parts of the Quran...
Journal title:
International Journal of Information Science and Management, Volume 10, Issue 1, pages 57-71